Abstract


Many businesses are burdened with the need to train students for the job 
instead of finding them prepared for it. Few business leaders feel that colleges 
prepare students for future jobs from day one. It can be a challenge for colleges 
to determine if their curricula meet the industry needs. Mapping industry needs to 
academic courses can be advantageous to both parties as it will allow colleges to 
be aligned with the industry needs and accordingly satisfy those needs and will 
allow the industry to hire better prepared graduates. In an attempt to address 
this, a system prototype that uses a collection of job descriptions from various 
sites and syllabi of college courses as the input knowledge was developed. The 
primary goal of the system is to help students to find courses that would be most 
beneficial in providing them with the skills that match a given job description. The 
secondary goal is to help faculty to quickly find out information about current 
skills and tools covered in the existing courses, which accordingly can help them 
to make decisions about their future courses to satisfy the industry needs. The
system was developed using the Natural Language Toolkit (NLTK) and the
Python programming language. Two sets of keywords were used to test the
system; the first one is the most common keywords and the second one includes
the most and least common keywords. Results from testing the system
demonstrate that using the latter set of keywords allowed for better results with
precision equal to 55% and recall equal to 39.61%.


Keywords: Natural Language Processing; Artificial Intelligence; Natural
Language Toolkit; Lemmatization; Course Development; Text Matching


Chapter 1: Introduction


1.1 Problem Statement 


One difficulty that colleges face today is competition from online
platforms that offer free tutorials and courses. Anyone can search the internet
and find a tutorial or a book that teaches a given skill without attending
college. This raises a major question: how can a college make a difference and
better prepare students for the workplace? Employers have begun to remove the
college requirement from their job descriptions because the needed skills can be
learned outside the college setting. Employers even argue that college education
is not preparing students well enough for the workplace, no matter how well
prepared the students believe they are. Jaschik explains this further in his
article [1]. Although this issue is widespread and growing with time, this
project provides a tool to help address it.


1.2 Goals 


The primary goal was to build a system to be used by students. The 
system, named Curtus, would provide students with a ranked list of classes to 
take based on a job description. Curtus uses natural language processing to: 

1. help students pick the right classes that they need for a particular job;
2. help colleges set up their courses in the best way possible for students to
   be ready for jobs;
3. help employers know just how well the college is preparing students for
   their needs.

A student should be able to plug in a job description and find classes that will
directly teach them what they need for that job. If the job requires some
knowledge of a programming language like C#, Curtus should recommend
classes that teach C# as well as other courses that practice the language.
Employers can use the same tool to learn about the courses available at a
college and whether they prepare students for their particular job category.


1.3 Natural Language Processing (NLP) 


Natural Language Processing (NLP) is a field that deals with the
understanding and interpretation of human language by computers. NLP is a
branch of artificial intelligence (AI) and has a variety of applications.
Today NLP is found everywhere: in smartphones, game systems, and even
some household appliances [2]. One of its first big appearances was in the
1950s, when it was used to translate Russian text into English [3]. At the time,
however, it was not very successful, and ten more years of research on the
topic did not yield much progress. Advancements in AI and machine
learning have led to the reemergence of the field into what we see now [3].

The breadth of applications that NLP has now is astounding [4]. They
include Sentiment Analysis [5], Text Summarization [6], Information
Extraction [7], Topic Segmentation [6], Question Answering [8], Part of
Speech (POS) Tagging [9], Parsing [7], Translation [10], and Argumentation
Mining [11].

The most common place people find NLP useful is autofill [4], whether in
a web browser, while typing a text message, or when giving voice commands to a
computer. Small tasks such as finding misspelled words or words used
incorrectly for their context in a document can also take advantage of NLP.


1.4 NLP and Text Matching 


Text matching is a term that describes finding how closely one text
matches another. It is commonly used in searching for web pages. Matching
is done at three different levels: word matching, phrase matching, and
sentence matching. Curtus uses word matching to find the classes that cover
the most skills that a job is looking for.


1.5 Contribution 


A final version of Curtus would add a new level of comparison to the
area of text matching. Most research in text matching focuses on how similar
two texts are and on the different methods for measuring that similarity (this
is further explained in Chapter 2). To recommend the classes that are most
relevant to a job, one more level of detail is needed for the matching. Unlike a
search engine, where words and phrases are typed in manually, Curtus uses job
descriptions as its knowledge base. While much text matching research focuses
on the searching portion, this project’s focus is on knowing the importance of
each of the inputs before the text matching. Curtus needs to decide which
keywords are the most important and then assign weighted values to those
words.





1.6 Challenges 


Throughout the project, there were multiple barriers that were faced, and 
challenges met. One of the challenges was the amount of time spent creating 
human-traced results to be able to evaluate the system. With over 100 syllabi 
and six job descriptions, each was hand traced on paper and compared. 
Another challenge was the fact that the used syllabi do not have the same file 
type or extension. Extra unforeseen work was done to get all of the information 
into one shared style while retaining the flexibility to easily add more courses to 
Curtus. 

While implementing the system, it was a challenge to be able to assemble 
the system in such a way that it could be quickly tested and debugged. A custom 
shell ended up being made to run commands to import and export files, work with 
different test sets, view underlying data, and modify the current session. 

The final challenge faced in this project was deciding how to display the
results of the system properly, that is, finding a good way to display or
visualize hundreds of results at one time.


1.7 Thesis organization 


This thesis is organized as follows. The next chapter reviews the
literature on related projects and curricula development. Chapter 3 focuses on
the design of Curtus and how data is passed through it. Chapter 4 discusses in
detail the implementation of the system and its programming aspects. Chapter 5
goes over the evaluation methods as well as the test data used by the system.
Lastly, Chapter 6 discusses the results obtained from testing Curtus, the
conclusion, and future work.





Chapter 2: Background 
2.1 Other Work in NLP 


A great deal of research is being done in NLP. Many works focus on the
types of data that are extracted through text matching [12] and on using
methodologies from other areas for text matching [13]. Other
application-based papers show the use of NLP and text matching in projects
such as recommending articles in scientific communities [14] and identifying
matching citations from papers [15]. Two research papers acted as the primary
basis for this research prior to the implementation of Curtus: one on
information retrieval [12] and one on text matching as image recognition [16].

The information retrieval paper provided great insight into the different
types of information that can be extracted. One item in particular that it
discusses is weighting words by how frequently they appear in the text. The
system looks at a question or statement and, based on the words provided,
rates it by how often each word appears across a body of knowledge. The
phrase that has the highest score is given back as the answer. That concept of
applying weights and scores has been widely used across the field.

The second paper, text matching as image recognition, focused on text
matching using concepts from image recognition to find patterns in text. This
method can be used to find similar phrases and sentences that do not share the
same words but instead share the same meaning and structure.





2.2 Curricula Development 


In curricula development, NLP, as well as other forms of AI, has already
assisted in many different ways. Natural language tools have been used in
settings that range from helping within a single classroom to helping across
multiple classrooms. One application, known as Language Muse, helps teachers
build instruction and lesson plans for students who are learning English [10].
It used NLP to provide immediate feedback on their work, as well as for
summarization and translation of English to Spanish. While this tool helps to
restructure a class, Curtus focuses on a wider range of courses as a whole
within a college setting.

Work in the college setting under the same scope as Curtus includes a
knowledge map tool built for evaluating medical curricula documents [8]. The
tool was used to extract important words or phrases from the documents. This
is highly similar to the final goal of Curtus, which is to identify the most
important keywords from job descriptions. Although that project is similar to
Curtus, particularly in the area of measurements, it did not provide any
information on an acceptable error range against which Curtus’ evaluation
could be gauged.

Curtus aims to provide assistance at a higher level than the other two
projects. Similar to the tool used for medical curricula [8], Curtus extracts
the keywords from job descriptions that it considers the most important. The
differences start to appear when Curtus has to assign a value that defines how
important a word is. These values are used to find the courses whose syllabi
are most relevant to the keywords. Using the values of the keywords and the
frequencies with which they appear in the syllabi, Curtus would provide the
applicable courses in ranked order based on how important the skills they
offer are and how many skills they provide.

Developing these courses can be a major challenge, as shown by the
amount of work being done to aid in this process [10] [8]. Curtus is a tool that
can be used in this area to point students to classes that can prepare them
for jobs. One primary example that it would be able to assist with is the
Illinois Institute of Technology’s attempt to increase the real-world aspect
of its computer science program [17]. Colleges like the Illinois Institute of
Technology would be able to use this system to evaluate the added courses and
make sure that they are providing students with the right skills.





Chapter 3: Curtus Architectural Design 


3.1 System Design 
A collection of syllabi for courses taught at the TSYS School of Computer
Science at CSU and job descriptions from internet job posting sites were used
to achieve the thesis goals. Each course syllabus provides details describing
the skills learned in the course and the topics covered. A database was built
using keywords collected from 102 syllabi, which are stored in their original
format. Job descriptions were collected from public job posting sites like
Glassdoor and LinkedIn; they were used mainly for testing purposes and were not
necessarily stored in a database. The full steps for this implementation are
displayed in Figure 1.















Figure 1: Steps in the project implementation


The implementation goes through the following steps:
1. Retrieve syllabi for related classes
2. Find job descriptions that can be used for test cases
3. Trace on paper how comparisons will be made
4. Build a database out of the syllabi
5. Implement NLTK to be able to process the data
6. Test and evaluate the results

The system, however, is not made to evaluate a college program; it can
only tell someone which classes they should take. The system can only make
recommendations based on the information it has, so the classes it provides
may be a good fit for a job description compared to the other classes without
being the best classes to take. It is up to the user to judge how good a course
is or how well courses are set up for a certain job.


3.2 System Architecture 


The system architecture comprises four layers: the first layer is the
user interface, the second is the preprocessor for data input, the third is the
keyword extraction engine, and the last is the output layer (see Figure 2).





[Diagram: the User works through the User Interface; the Job Description and
the Syllabi pass through the Preprocessor and then the Keyword Extraction
Engine, which outputs the Relevant Courses.]

Figure 2: Curtus Architecture


The user interacts with the system through the text-based user interface, where
they can upload a job description file. That file is then preprocessed along
with the syllabi that are already stored in the system. During preprocessing,
Curtus takes the different file types and pulls the text out into local storage
as regular text. That text is then parsed to remove any stop words and to
apply lemmatization, which puts all of the words in the text into their base
form. Removing stop words ensures that common words like ‘and,’ ‘the,’ and ‘to’
are not considered as keywords. Lemmatization relieves the system of having to
make sure all words are in the same tense, and it sets them to singular; words
like mice and syllabi are changed to mouse and syllabus. After the
preprocessing phase, the keyword extraction engine extracts the important
keywords and gives back the results for each syllabus. Those results are used
to sort the syllabi by what Curtus considers most important (syllabi that
contain the highest number of matches when compared to the job description are
ranked higher). The ranked courses are given back to the user as output.
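
The preprocessing step described above can be sketched as follows. This is a
minimal illustration and not the code used in Curtus: a small hand-written
stop-word set and an irregular-plural table stand in for the NLTK stop-word
list and lemmatizer so that the example runs without downloading any data, and
the names STOP_WORDS, IRREGULAR, lemmatize, and preprocess are hypothetical.

```python
import re

# Assumption: tiny stand-ins for NLTK's stop-word list and lemmatizer.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}
IRREGULAR = {"mice": "mouse", "syllabi": "syllabus"}


def lemmatize(word):
    """Return a rough base form: irregular table first, then drop a plural 's'."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def preprocess(text):
    """Lower-case the text, drop symbols and stop words, lemmatize the rest."""
    # Keep '#' and '+' so that skill names such as 'c#' survive tokenization.
    tokens = re.findall(r"[a-z#+]+", text.lower())
    return [lemmatize(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("The mice read the syllabi and courses in C#"))
```

The suffix rule is deliberately crude; a real lemmatizer relies on a vocabulary
and morphological analysis rather than on string endings.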
Chapter 4: System Development


4.1 Libraries Used 


Many libraries were used in building this project. Table 1 lists the
libraries used, along with the way in which each was used, in order to give
credit to them.


Table 1: Libraries Used

Library | Use
NLTK    | Used to extract keywords from the text by frequency. It also
        | provided tools to filter out common words such as and, in, to, the,
        | etc.
Codecs  | Used to read in syllabi that are in HTML format and retrieve only
        | the content from the files.
PyPDF2  | Used to read in syllabi that are in PDF format and retrieve only
        | the content from the files.
Codecs  | Used to read in syllabi that are in PDF format intended for web
        | pages and retrieve only the content from the file.


4.2 Natural Language Toolkit 
NLTK, the Natural Language Toolkit, is the tool that was used on this project.
After looking through multiple tools available for natural language
processing, NLTK proved to be the best fit and the most well rounded of the
tools that had been found. NLTK has many advantages, including:
• It is open source, so it is very easy to extend as well as being well
  refined
• It has a book that gives detailed instruction on downloading, installing, and
  importing the library [18]
• It is widely used for research [19]





4.3 System Implementation 


The system is implemented using the following steps:

1. The algorithms for keyword matching were written so that keywords are
   first sorted alphabetically using merge sort. Because the keywords are
   presorted, comparing two files takes O(n) time instead of O(n²). This code
   is shown and discussed in Section 4.4. The merge sort algorithm shown in
   Appendix A ensures that the program sorts the information with time
   complexity O(n log n).
2. Different file types are covered to make the system as flexible as it can
   be. This code is displayed and discussed in Section 4.5.
3. Nouns are parsed out and sorted by frequency of occurrence, most frequent
   first. Certain words of interest were noted as well. This code is shown and
   discussed in Section 4.6.
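
For readers without Appendix A at hand, a textbook merge sort over a list of
keywords looks like the following. This is an independent sketch rather than
the listing from the appendix, and the function name merge_sort is an
assumption.

```python
def merge_sort(words):
    """Return a new, alphabetically sorted list in O(n log n) time."""
    if len(words) <= 1:
        return list(words)
    middle = len(words) // 2
    left = merge_sort(words[:middle])
    right = merge_sort(words[middle:])

    # Merge the two sorted halves by repeatedly taking the smaller head.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort(["hotel", "echo", "foxtrot"]))
```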


4.4 Comparing the sorted files 


This algorithm works under the assumption that the two lists passed to it
are sorted. As a result, it can go through them in one clean sweep to find any
matching pairs. Figure 3 shows a visual representation of the order in which it
compares the lists. It is worth noting that if the same lists provided in the
example were compared while unsorted, the number of comparisons would go from
18 to 81. Scaled up to comparing two lists of 150 words, the difference is
between 300 comparisons and 22,500 comparisons. Multiplied across six job
description test cases and 102 syllabi, that comes to 13,770,000 comparisons
with unsorted lists. With sorted lists, the worst-case number of comparisons is
only 183,600.










Figure 3: Comparison Of Sorted Lists
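
The single sweep over two sorted lists can be sketched as below, followed by
the arithmetic behind the comparison counts quoted in this section. The
function name count_matches is hypothetical, and the sketch assumes that
neither list contains duplicate keywords.

```python
def count_matches(sorted_a, sorted_b):
    """Count keywords present in both alphabetically sorted lists in one sweep."""
    i = j = matches = 0
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i] == sorted_b[j]:
            matches += 1
            i += 1
            j += 1
        elif sorted_a[i] < sorted_b[j]:
            i += 1  # the word in list A sorts first, so advance A
        else:
            j += 1  # the word in list B sorts first, so advance B
    return matches


# Comparison counts from the text, for two lists of n keywords each.
n, jobs, syllabi = 150, 6, 102
sorted_total = (n + n) * jobs * syllabi    # 300 per pair
unsorted_total = (n * n) * jobs * syllabi  # 22,500 per pair

print(count_matches(["alpha", "echo", "hotel"], ["echo", "foxtrot", "hotel"]))
print(sorted_total, unsorted_total)
```

Each step of the sweep advances at least one of the two indices, which is why
the work grows with the combined length of the lists rather than with their
product.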
4.5 File Types


Thanks to the imported libraries, any syllabus can be added to the system in
one of the following file types: Docx, Txt, HTML, and PDF. The usage of each
library, as well as the implementation, is provided in Appendix A.
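
One way to keep file handling flexible is to choose a reader by file extension.
The sketch below is not the Appendix A code: it covers only Txt and HTML, using
the standard codecs and html.parser modules, and the names READERS,
extract_text, read_txt, and read_html are assumptions. Readers for PDF and Docx
would be registered in the same table using their respective libraries.

```python
import codecs
import os
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect the text content of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def read_txt(path):
    """Read a plain-text file as UTF-8."""
    with codecs.open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def read_html(path):
    """Read an HTML file and return its text with whitespace collapsed."""
    parser = _TextOnly()
    parser.feed(read_txt(path))
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Assumption: PDF and Docx readers would be added to this table as well.
READERS = {".txt": read_txt, ".html": read_html}


def extract_text(path):
    """Pick a reader from the file extension and return the plain text."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in READERS:
        raise ValueError("unsupported syllabus format: " + extension)
    return READERS[extension](path)
```

Adding a new format then only requires one more entry in the table, which
matches the goal of being able to add courses easily.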
4.6 Keyword Extraction


When extracting the keywords, Curtus goes through several steps to
make sure the data is ready to work with. The system starts by simplifying the
data through lemmatization. “Lemmatization usually refers to doing things
properly with the use of a vocabulary and morphological analysis of words,
normally aiming to remove inflectional endings only and to return the base or
dictionary form of a word, which is known as the lemma” [13].

Curtus then removes words considered stop words (a, an, in, etc.) and
removes any symbols from the document. Once all of the extra words are
removed, it takes a certain set of what it considers important keywords.
Appendix A shows the implementation for 75:75 keywords. Here 75:75 means that
the program takes the 75 most common keywords and the 75 least common
keywords and uses those for processing the information.
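
The 75:75 selection can be illustrated with a frequency count. This is a
sketch under stated assumptions rather than the Appendix A implementation: the
function name select_keywords is hypothetical, ties in frequency are broken
alphabetically (the thesis does not say how ties are handled), and the input is
assumed to be a list of already preprocessed words.

```python
from collections import Counter


def select_keywords(tokens, top=75, bottom=75):
    """Return the `top` most common and `bottom` least common distinct words."""
    # Sort by descending count; alphabetical order breaks ties (an assumption).
    ranked = sorted(Counter(tokens).items(), key=lambda item: (-item[1], item[0]))
    words = [word for word, _ in ranked]
    if len(words) <= top + bottom:
        return words  # too few distinct words to split, so keep them all
    return words[:top] + words[len(words) - bottom:]


tokens = ["java"] * 3 + ["sql"] * 2 + ["git", "agile", "linux"]
print(select_keywords(tokens, top=1, bottom=1))
print(select_keywords(tokens, top=2, bottom=0))
```

Calling the function with top=150 and bottom=0 gives the set of the 150 most
common keywords, and the defaults give the 75:75 set.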
Chapter 5: System Evaluation


5.1 Overview 


Since we were not able to find any other systems in the area of this work,
a manual evaluation had to be done to set benchmark results against which
Curtus’ output can be compared. The system is expected to compare existing
courses to six job descriptions and recommend a good set of courses that map
to each of the job descriptions. The evaluation calculates precision and recall
for each of the input job descriptions to judge whether the recommendations
are plausible.


5.2 Test Data 


Job descriptions were selected on the expectation that they are good
fits, in the sense that a human can easily map each of them to a set of
courses. The target of the evaluation is to see whether Curtus can provide the
same set of courses, that is, whether Curtus is a plausible system. In
addition, Curtus would provide the courses as a ranked set with the most
relevant ones displayed at the top. Based on that, jobs within the following
areas were chosen:

• Information Technology
• Cyber Security
• Simulations
• Game Programming
• Web Developer
• Java Programmer


5.3 Human Evaluation 


The researcher manually went over all of the test job descriptions and
came up with the best set of courses that map to each job description. These
sets of courses provide the benchmark data to which Curtus’ results are
compared. Each of the syllabi, course descriptions, and instructor descriptions
was looked through and evaluated by hand to see what skills are taught in each
of them, which is further discussed in Section 5.4.

Curtus was evaluated and measured by the following metrics: (1) how many
of the wanted keywords were found, (2) how many total keywords were found,
and (3) how good the system is based on how many of the expected courses
Curtus provides as recommended courses. The comparison of expected courses to
recommended courses was measured using precision and recall.
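
Precision, recall, and the F1 score follow their standard definitions and can
be computed from the recommended and expected course lists as sketched below.
The function name and the course labels in the example are hypothetical and do
not come from the test data.

```python
def precision_recall_f1(recommended, expected):
    """Compare recommended courses against the hand-picked expected courses."""
    recommended, expected = set(recommended), set(expected)
    hits = len(recommended & expected)
    # Precision: share of the recommended courses that were expected.
    precision = hits / len(recommended) if recommended else 0.0
    # Recall: share of the expected courses that were recommended.
    recall = hits / len(expected) if expected else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example: 10 recommendations, 15 expected courses, 6 in common.
recommended = ["C%d" % i for i in range(10)]
expected = ["C%d" % i for i in range(6)] + ["X%d" % i for i in range(9)]
print(precision_recall_f1(recommended, expected))
```

With 6 of 10 recommendations correct out of 15 expected courses, precision is
60%, recall is 40%, and the F1 score is 48%.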


5.4 Hand Traced Validation Set Process 


Figure 4 below shows a scanned image of keywords from one of the job
descriptions. The hand traced job descriptions were used as a way to evaluate
the results obtained from Curtus. If Curtus finds the same number of keyword
matches as the hand traced documents, then the results are considered good.
Figure 4: Scanned Hand Traced Document (Java Dev)


The following steps were applied manually to each of the job descriptions to
extract keywords and use them for matching the syllabi in the test data:
1. Cross out any stop words
2. Note the count of each word in the document
   a. Mark a one above the first occurrence
   b. If a next occurrence is found, black out the first number and put the
      next number above it
   c. Repeat step b until the end of the document is reached
3. List the most common words and underline the most important words
4. Use the most important words to search through the syllabi
   a. Mark the count on the edge of the page in order of course
      numbers. These numbers are used for reference in Figure 5,
      Figure 6, and Figure 8.
5. Repeat all steps for each of the 6 job descriptions


5.5 Curtus Evaluation 


In the first phase of the project, the aim was to work at the lowest level of
intelligence required for a fully functional prototype. This step helped in
discovering any unseen aspects of the project and bringing them to light. For
file retrieval, Curtus uses the test data mentioned earlier. The following
steps were applied:

1. Extract the top 150 most common keywords from each syllabus
2. Extract the top 150 most common keywords from the job description
3. Evaluate the syllabi to see which ones have the most matching keywords
4. Take the top 10 matches and further evaluate them through keyword
   weights and provide them as results sorted by valued importance. Valued
   importance was based on how many distinct keywords are found and how
   many of each of those keywords are found.
Repeat with changes:
1. Extract the top 75 most common keywords and bottom 75 least common
   keywords from each syllabus
2. Extract the top 75 most common keywords and bottom 75 least common
   keywords from the job description
3. Evaluate the syllabi to see which ones have the most matching keywords
4. Take the top 10 matches and further evaluate them through keyword
   weights and provide them as results sorted by valued importance.

Top keywords are the most common words within a file, and bottom keywords are
the least common words in a file.

Further evaluated files were ordered by keyword count and were used to
view which keywords were most commonly found across the files. Two data sets
were used because many of the important keywords are less common in the
texts being used. By using the top and bottom keywords, the hope is to reach
better results than with the top keywords alone.
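
Steps 3 and 4 can be sketched as a single ranking function. This is a
simplified illustration and not the code used in Curtus: the function name
rank_syllabi and the data layout are assumptions, and the exact keyword weights
are replaced by a plain count of occurrences used to break ties.

```python
def rank_syllabi(job_keywords, syllabi, limit=10):
    """Rank syllabi by the keywords they share with a job description.

    job_keywords: set of keywords extracted from the job description.
    syllabi: dict mapping a course name to a dict of keyword -> count.
    """
    scored = []
    for course, counts in syllabi.items():
        shared = job_keywords.intersection(counts)
        distinct = len(shared)                        # distinct keywords found
        total = sum(counts[word] for word in shared)  # occurrences of them
        scored.append((course, distinct, total))
    # Most distinct matches first; total occurrences break ties (assumption).
    scored.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return scored[:limit]


# Hypothetical keyword counts for three courses.
syllabi = {
    "CLASS_A": {"java": 3, "sql": 1},
    "CLASS_B": {"java": 5},
    "CLASS_C": {"html": 2},
}
print(rank_syllabi({"java", "sql", "git"}, syllabi, limit=2))
```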
Chapter 6: Results, Conclusion and Future Work


6.1 Results 


Job descriptions for a Java Developer, Information Technology, Simulation
Engineer, Threat Analyst, Web Developer, and Game Developer were used to
test Curtus. For each one, Curtus would evaluate each course and report how
many keyword matches were found, how many words there are in the file, weights
for the keywords, and a weight for the course. Weights for keywords were
assigned by how frequently a word appeared within the file. These weights
provide another metric by which to sort the results. For each job description,
Curtus would also provide a word count of each keyword that was considered
from the syllabi.

With the word count, the software could be refined further by revealing
words that were not correctly filtered out by the keyword extraction, such as
us, we, and them. After those words were filtered out, Curtus was tested and
results were collected from this set. Those results were then compared with a
document containing hand traced forms of the job descriptions that marked
which words Curtus should consider the most significant, as well as which
courses should be considered good fits for the job. The results of this
comparison are shown in Figure 5.
Figure 5: Expected matches versus matches found for top 150 most common keywords


In Figure 5, the blue dots represent the results from human evaluation, and the
red dots represent the results obtained from Curtus. The closer the dots are to
one another, the better the results. With the first set (the 150 most common
words), Curtus ended up missing many of the prime keywords. This can be
attributed to the fact that Curtus was simply working off of which syllabi had
words that are more common in the English language.

The results were further evaluated using the weights of the files. Smaller
files would have a larger weight assigned to each keyword. This weight would
then be multiplied by the number of keywords found in the file to give a weight
for the file as a whole. Using that final value, the top 10 matches were
compared to the ones obtained from just counting the keywords. The weighted
results were interesting but were not good enough to change the previous
conclusion for this set. Testing Curtus with this set therefore did not
produce a valid solution. This set did, however, help in filtering out data and
drawing conclusions to use for the second iteration of testing.

A second set with the top 75 keywords (most common) and the bottom 75
keywords (least common) from each of the job descriptions and each of the
syllabi was used in the second iteration of testing. Full output for this set is
available in Appendix C. Results for this set were evaluated based on the same
metrics used on the set of 150 most common keywords: the keyword matches, the
number of words in the file, the weight for the keywords, and the weight for
the file itself. Compared to the human evaluation, the results showed better
performance this time, as shown in Figure 6.


[Scatter plot “Matches With Keyword Set 75:75”: matches plotted against File
Numbers, with the series Expected Matches and Resulting Matches 75:75.]

Figure 6: Expected matches versus matches found with 75 most common keywords and
75 least common keywords


In Figure 6, the blue dots represent the results from the human evaluation, and
the grey dots represent the results obtained from Curtus. The closer the dots
are to one another, the better the results. Many of the keywords that were not
considered in the first set appeared in the second set, providing good results
for Java Developer and Web Developer. The other results based on the keyword
matching were not quite as good.

These results were further evaluated using the weights of the files, as in
the previous test. The top 10 matches were compared to the human results and
were much better than with the previous set. However, they were still not
satisfactory. An example run is shown in Figure 7. When the weights were used,
the decisions improved for the Information Technology and Game Developer jobs
but not for the other jobs. The results for Information Technology can be
attributed to syllabus descriptions using terms that are more common to the
field of computer science. The Game Developer results can most likely be
attributed to the opposite issue of using terms that are not common at all.
Please Input a File:
javaDev.txt
File Found
Evaluating File [100.00%] : DONE: ./JobDescriptions/javaDev.txt
('CLASS_AG', 20)
('CLASS_AR', 20)
('CLASS_BC', 20)
('CLASS_AB', 20)

You should take the following classes

| File Name | Keywords | Strength |
| CLASS_AG  | 20/150   | <41.867> |
| CLASS_AR  | 20/150   | <41.6>   |
| CLASS_BC  | 20/150   | <33.507> |
| CLASS_AB  | 20/150   | <29.107> |
| CLASS_BG  | 19/150   | <42.978> |
| CLASS_CA  | 19/150   | <33.807> |
| CLASS_BE  | 19/150   | <33.757> |
| CLASS_BN  | 19/150   | <31.629> |
| CLASS_AU  | 18/150   | <27.168> |
| CLASS_CW  | 18/150   | <27.168> |

Figure 7: Active Run-End User. This figure shows the results of a test run for
the prototype. The user provides a txt file with a job description, and the
system provides the top 10 results sorted by the strength of the file.







6.2 Results Analysis 


The obtained results show that the least common words are as important
as the most common words. Results from the two sets show that the second set
is far better than the first one. When comparing the two sets, the 75:75 set
was much closer to the expected result, as shown in Figure 8.


[Scatter plot “Keyword Set 150:0 vs Keyword Set 75:75”: Matches plotted
against File Number, with the series 150:0 Difference From Expected and 75:75
Difference From Expected.]

Figure 8: 150:0 and 75:75 Compared.

Figure 8 shows how far the results strayed from the expected results. The 150:0
series represents the 150 most common keywords found in the file, and the 75:75
series represents the set that used the 75 most common keywords and the 75
least common keywords. The lower the dots are on the graph, the better the
result is. The first phase, however, was not intended to deliver a full
solution but instead to reveal what attributes need to be considered and what
needed to be added.


Even though the keyword-count results were better under the 75:75 set, that does not answer the main question: does the system recommend good classes? In the human evaluation, each of the job descriptions was associated with a list of courses that, from a human perspective, are considered great courses for the job. It is worth noting that each list contains 10-15 courses in no particular order. The results of this comparison are displayed in Table 2.


Table 2. Precision and Recall for Recommended Courses

             |           150 Set            |          75:75 Set
Job          | Recall  Precision  F1 Score  | Recall  Precision  F1 Score
Java         | 0%      0%         0%        | 35.2%   60%        44.37%
Info Tech    | 53%     40%        45.59%    | 20%     30%        24%
Simulation   | 37.5%   50%        42.85%    | 43.75%  60%        50.6%
Cyber Sec    | 37.5%   60%        46.15%    | 56.25%  70%        62.37%
Web Dev      | 38.8%   70%        49.93%    | 61.1%   80%        69.28%
Game Dev     | 35.7%   50%        41.66%    | 21.4%   30%        24.98%
Average      | NA      45%                  | NA      55%


Based on the list of provided courses, Curtus reached an average
of 55% precision with the 75:75 set. In other words, Curtus provided, on
average, 5 to 6 classes out of 10 that can be considered good classes. One important note
on the results is that the presence of 4 to 5 classes that are not considered good
classes does not mean those classes do not fit at all. These classes most likely
can still provide skills applicable to the job; it simply means that there
may still be better courses.
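
The precision, recall, and F1 values in Table 2 follow the standard definitions, which can be sketched as below. This is a minimal illustration, not the evaluation script used for the thesis; the function name `precision_recall_f1` and the course labels are hypothetical.

```python
def precision_recall_f1(recommended, relevant):
    """Compare a recommendation list against a human-chosen course list."""
    recommended_set = set(recommended)
    relevant_set = set(relevant)
    hits = len(recommended_set & relevant_set)  # courses found in both lists
    precision = hits / len(recommended_set) if recommended_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical course labels, not taken from the evaluation data.
recommended = ["C01", "C02", "C03", "C04", "C05",
               "C06", "C07", "C08", "C09", "C10"]
relevant = ["C01", "C02", "C03", "C04", "C05", "C06",
            "C11", "C12", "C13", "C14", "C15", "C16"]

precision, recall, f1 = precision_recall_f1(recommended, relevant)
print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%}")
```

The same harmonic-mean formula is consistent with the table: for the Java row under the 75:75 set, 2(0.60)(0.352)/(0.60 + 0.352) is approximately 0.4437.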

The major question raised by the results was: how does the syllabus format affect the results? In this case, it affects them directly. Many of the syllabi provide more information than just what is taught in the class. For example, many syllabi include sections on course policies, attendance policies, etc. Some of them list extra information about a field that they are tethered to even though they might not teach those particular skills. That extra information can be very problematic for software like Curtus because some long syllabi contain only information on what is learned in the class, while others contain information that is not part of the content learned in the class.

For example, a syllabus that gives a great explanation of
the skills targeted in a class would be considered a good match. That good
syllabus, however, could be overshadowed by another syllabus that is longer and
simply has more keywords available to match. Weighting does not fully solve this
problem: because larger weights are assigned to short syllabi, a long syllabus that
covers everything a student might learn in a class, but also carries extra
information, would receive a small weight. The conclusion is that
a syllabus should not be diluted or devalued as a result of its length.
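
The effect of syllabus length on the weight can be illustrated with a short sketch. The formulas below are an assumption inferred from the "Words In File", "Weight of Result", and "Calculated Strength" values printed by the prototype, not a definition given in this section; the function names and the rounding to three decimals are likewise hypothetical.

```python
def weight_percent(matches, words_in_file):
    """Assumed weight: matched keywords as a percentage of the file's words."""
    if words_in_file == 0:
        return 0.0
    return round(matches / words_in_file * 100, 3)


def strength(matches, words_in_file, keywords=150):
    """Assumed strength: weight (%) scaled by the percentage of keywords matched."""
    coverage_percent = matches / keywords * 100
    return round(weight_percent(matches, words_in_file) * coverage_percent, 3)


# Same number of matches, but the second syllabus is twice as long.
short_syllabus = strength(20, 916)
long_syllabus = strength(20, 1832)
print(short_syllabus, long_syllabus)
```

Under these assumptions, doubling the length of a syllabus roughly halves its strength even though it matches the same keywords, which is the dilution effect described above.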

One solution to this problem would be to remove any unused information
from the syllabi (files). Sections such as attendance, policies, and grading would
be removed from the file, leaving only the important information. This also raises a
question: what makes a good syllabus? For Curtus to provide the best
results from a provided syllabus, the syllabus should contain:

• Skills learned within that class;
• A calendar or schedule with detailed information on what topics would be
covered each week.

The worst results were obtained from syllabi that:

• contain a broad description of skills that should be gained from the course;
• contain either no calendar or an undetailed calendar;
• provide a background on the field of study, presenting skills that are not
covered within the course;
• have information copied and pasted from other courses and not correctly
modified.
If certain syllabi were omitted from the system, the results would have been
much better. However, that would not be a finalized solution. A finalized
solution would provide valid results using all the information provided and
would be able to make its own decisions on which information to consider.
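
Removing sections such as attendance, policies, and grading could be prototyped with a simple heading filter. The sketch below is only a heuristic under stated assumptions: a heading is taken to be a short line that does not end in a period, and the keyword list `SKIP_HEADINGS` is hypothetical. It is not part of Curtus.

```python
# Hypothetical heading keywords; a real syllabus collection would need tuning.
SKIP_HEADINGS = ("attendance", "policy", "policies", "grading")


def looks_like_heading(line):
    """Assumption: a heading is a short line with no sentence-ending period."""
    stripped = line.strip()
    return 0 < len(stripped) <= 60 and not stripped.endswith(".")


def strip_sections(text):
    """Drop every section whose heading mentions one of SKIP_HEADINGS."""
    kept = []
    skipping = False
    for line in text.splitlines():
        if looks_like_heading(line):
            lowered = line.lower()
            # Each heading decides whether the lines below it are kept.
            skipping = any(word in lowered for word in SKIP_HEADINGS)
        if not skipping:
            kept.append(line)
    return "\n".join(kept)


sample = (
    "Course Topics\n"
    "Week 1: Java classes and objects.\n"
    "Attendance Policy\n"
    "Students must attend every lecture.\n"
    "Schedule\n"
    "Week 2: Inheritance.\n"
)
cleaned = strip_sections(sample)
print(cleaned)
```

A filter like this only works when syllabi use recognizable headings; free-form syllabi would still need the system to decide on its own which information to consider.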


6.3 Further Results 


Table 3 shows the numeric values resulting
from all the test cases. The Expected column shows the number of keywords
expected for each of the syllabi for that job description. The total columns display
how many keywords Curtus was able to find. The range columns show how
different the results were from each of the expected values (human evaluation).
The lower the range, the closer the results were to the expected values.


Table 3: Total Results

Job          | Expected | 150 total   | 75:75 total | 150 range   | 75:75 range
JavaDev      | 1404     | [illegible] | [illegible] | 592         | 423
Info Tech    | 1237     | 1422        | 1186        | 385         | [illegible]
Simulation   | 1298     | 1620        | 1161        | 490         | 297
Cyber Sec    | 1348     | [illegible] | [illegible] | [illegible] | 279
Web Dev      | 948      | [illegible] | [illegible] | 250         | 257
Game Dev     | 1362     | 1965        | 1824        | 731         | 626
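
One plausible reading of the total and range columns can be sketched as follows. It is an assumption, not a definition from the text, that the range accumulates the absolute per-syllabus difference between found and expected keyword counts; this reading is at least consistent with the range values being larger than the plain difference between the totals (for Simulation under 150 keywords, 1620 - 1298 = 322, while the range is 490). The function name and the sample counts are hypothetical.

```python
def total_and_range(found, expected):
    """found/expected: keyword-match counts per syllabus, in the same order."""
    if len(found) != len(expected):
        raise ValueError("found and expected must have the same length")
    total = sum(found)
    # Assumed reading of "range": accumulated absolute per-syllabus difference.
    range_value = sum(abs(f - e) for f, e in zip(found, expected))
    return total, range_value


expected_counts = [12, 8, 15, 10]  # hypothetical human-evaluated counts
found_counts = [14, 8, 11, 13]     # hypothetical counts reported by the system
total, range_value = total_and_range(found_counts, expected_counts)
print(total, range_value)
```
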
Appendix B has all the graphs providing visual results for all of the individual
values that lead to the results in the columns of Table 3. Table 4 gives information on each of the details in Appendix B.

Table 4: Short Summary

Appendix B1
    Display:     Blue Dots: Expected Matches
                 Red Dots: Matches under 150:0 Keywords
                 Green Dots: Matches under 75:75 Keywords
    Description: All six tables under this section are pretty simple in
                 how to interpret. The goal of the program was to get
                 the results as close to the blue dots as possible.

Appendix B2
    Display:     Blue Dots: Expected Matches
                 Red Dots: Matches under 150:0 Keywords
                 Green Dots: Matches under 75:75 Keywords
    Description (Graph 1): The first graph shows the average difference of each
                 of the graphs from the expected results. The closer the
                 graph is to 0, the better the results. The green dots are
                 the minimum of the two graphs, making them the better
                 results of the two.
    Description (Graph 2): This graph shows the plotted results of Table 2 above.
                 Same as the graphs from Appendix B1, results are
                 considered better the closer they are to the blue dots.


6.4 Conclusion 


Curtus is a tool that maps job descriptions to course syllabi and provides students with classes that they can take for a specific job with high confidence. With the constantly changing work environment, this tool provides a method of alleviating the difficulty of keeping up. Curtus provides a solution to this problem and also helped in other ways through the research process leading up to the working prototype. It revealed things such as what makes a good syllabus, and it provided other avenues of research in the field of Natural Language Processing. Curtus leaves openings for more research. One such opening is that it does not currently have the right metric to most effectively order the courses from best to just good. Areas like this provide a means of improving a system that has already been considered a success.


6.5 Future Work 


Improvement of keyword searching can be investigated. This step would aim to give a much more thorough evaluation of how well a syllabus maps to a job description. One way to attempt this is to apply weights to words using bigrams. Bigrams can be used as a tool to find how important a word might be. In a job description, the first iteration might find the words Python and Java and consider them of equal weight when looking at those keywords alone. Bigrams, however, can make a difference by introducing more information about each word, because they capture associated modifiers such as “Requires Java” and “Python System.” This puts the words in different categories, where Python would be considered just background while Java is a mandatory requirement for the job. Accordingly, syllabi that contain the word Java would have more weight.
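
The bigram idea can be sketched in a few lines. This is a hedged illustration of the proposed future work, not an implemented feature of Curtus: the modifier table `MODIFIER_WEIGHTS`, its values, and the function names are hypothetical, and tokenization is reduced to whitespace splitting to keep the example self-contained (NLTK's `nltk.util.bigrams` could produce the same pairs).

```python
from collections import defaultdict

# Hypothetical modifier weights; the values are illustrative, not tuned.
MODIFIER_WEIGHTS = {"requires": 2.0, "required": 2.0, "must": 2.0}
DEFAULT_WEIGHT = 1.0


def bigrams(tokens):
    """Return adjacent token pairs, e.g. [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(tokens, tokens[1:]))


def keyword_weights(text, keywords):
    """Weight each keyword, promoting those preceded by a known modifier."""
    tokens = text.lower().split()
    weights = defaultdict(float)
    # Every keyword that occurs at all starts at the default weight.
    for token in tokens:
        if token in keywords:
            weights[token] = max(weights[token], DEFAULT_WEIGHT)
    # A keyword directly preceded by a modifier gets the modifier's weight.
    for first, second in bigrams(tokens):
        if second in keywords and first in MODIFIER_WEIGHTS:
            weights[second] = max(weights[second], MODIFIER_WEIGHTS[first])
    return dict(weights)


weights = keyword_weights(
    "Requires Java and works with a Python system", {"java", "python"}
)
print(weights)
```

Here Java is promoted because it follows "Requires", while Python keeps the default weight, mirroring the mandatory-versus-background distinction described above.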
Appendix A: Scripts


Appendix A1: Sorting Algorithm 


def mergeSort(lst):
    # Sort lst in place (ascending) using a top-down merge sort.
    if len(lst) > 1:
        center = len(lst) // 2
        left = lst[:center]
        right = lst[center:]
        mergeSort(left)
        mergeSort(right)
        i = 0
        j = 0
        k = 0
        # Merge the two sorted halves back into lst.
        while i < len(left) and j < len(right):
            if left[i] < right[j]:
                lst[k] = left[i]
                i = i + 1
            else:
                lst[k] = right[j]
                j = j + 1
            k = k + 1
        # Copy whatever remains in either half.
        while i < len(left):
            lst[k] = left[i]
            i = i + 1
            k = k + 1
        while j < len(right):
            lst[k] = right[j]
            j = j + 1
            k = k + 1


Appendix A2: Keyword Comparison 


def compareLists(jobDescriptionKeys, syllabusKeys, show):
    # Return the keywords shared by both lists (case-insensitive).
    # Both lists must already be sorted; the two indices advance in step.
    shared = []
    count = 0
    a = 0
    b = 0
    while(a < len(jobDescriptionKeys) and b < len(syllabusKeys)):
        if(jobDescriptionKeys[a].lower() < syllabusKeys[b].lower()):
            a += 1
        elif(jobDescriptionKeys[a].lower() > syllabusKeys[b].lower()):
            b += 1
        elif(jobDescriptionKeys[a].lower() == syllabusKeys[b].lower()):
            count += 1
            shared.append(syllabusKeys[b])
            if(show == True):
                print(syllabusKeys[b])
            a += 1
            b += 1
    return shared


Appendix A3: Supported Files 


import codecs
import xml.etree.ElementTree as ET

import docx
import PyPDF2
from bs4 import BeautifulSoup


def getTextFromFile(fileName):
    # Return the text of a supported file.
    # getFileType (defined elsewhere) returns the file extension.
    content = ""
    mySwitch = getFileType(fileName)
    #for txt file types
    if(mySwitch == "txt"):
        f = open(fileName, "r")
        content = f.read()
        f.close()
    #for doc file types
    elif(mySwitch == "doc"):
        content = ""
    #for docx file types
    elif(mySwitch == "docx"):
        doc = docx.Document(fileName)
        fullText = []
        for para in doc.paragraphs:
            fullText.append(para.text)
        content = '\n'.join(fullText)
    #for html file types
    elif(mySwitch == "html"):
        f = codecs.open(fileName, 'r', 'utf-8')
        document = BeautifulSoup(f.read(), features="lxml").get_text()
        f.close()
        content = document
    #for pdf file types
    elif(mySwitch == "pdf"):
        pdfFileObject = open(fileName, 'rb')
        pdfReader = PyPDF2.PdfFileReader(pdfFileObject)
        count = pdfReader.numPages
        fullText = []
        for i in range(count):
            page = pdfReader.getPage(i)
            fullText.append(page.extractText())
        content = '\n'.join(fullText)
        test = "".join(fullText)
        pdfFileObject.close()
        #fall back to Tika when PyPDF2 extracts no text
        if(len(test)==0):
            from tika import parser
            parsed = parser.from_file(fileName, xmlContent=True)
            tree = ET.fromstring(parsed["content"])
            content = ET.tostring(tree, encoding='utf8', method='text')
    else:
        print("No matching file type found for: "+mySwitch)
    return content


Appendix A4: Keyword Extraction 


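import sys

from nltk import FreqDist
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize


def extractTokens(text):
    # Tokenize, lemmatize, and filter the text.
    # otherWords is a list of extra words to ignore, defined elsewhere.
    try:
        tokens = word_tokenize(text)
    except Exception:
        print("Error 101")
        input("Press enter to exit")
        sys.exit()
    lemmatizer = WordNetLemmatizer()
    bonuses = [lemmatizer.lemmatize(token) for token in tokens]
    stopwordList = stopwords.words('english')
    extrabonus = [bonus for bonus in bonuses if bonus not in stopwordList and
                  bonus.lower() not in otherWords]
    # Drop symbols, numbers, and all-uppercase tokens.
    noSym = [token for token in extrabonus if token != token.upper()]
    return noSym


def getKeywords(text):
    # Return the most common and least common keywords with their counts.
    # KEYWORDS is the total number of keywords to use, defined elsewhere.
    lst = extractTokens(text)
    fdist1 = FreqDist(lst)
    keys = KEYWORDS//2
    most = fdist1.most_common(keys)
    least = fdist1.most_common()[-keys:]
    result = most + list(set(least)-set(most))
    return result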
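
The 75:75 selection in `getKeywords` can be tried without NLTK data by using `collections.Counter`, which provides the same `most_common` method that `FreqDist` inherits. The sketch below is an illustration rather than the thesis code: `select_keywords` is a hypothetical name, and it swaps the set difference for an order-preserving filter so the result is deterministic.

```python
from collections import Counter


def select_keywords(tokens, total=150):
    """Pick the total//2 most frequent and total//2 least frequent tokens."""
    half = total // 2
    ranked = Counter(tokens).most_common()  # sorted by count, highest first
    most = ranked[:half]
    least = ranked[-half:] if half else []
    # Keep order and avoid duplicates when the two halves overlap.
    return most + [item for item in least if item not in most]


tokens = (["java"] * 5 + ["python"] * 4 + ["sql"] * 3
          + ["git"] * 2 + ["agile", "scrum"])
selected = select_keywords(tokens, total=4)
print(selected)
```

With `total=4`, the two most frequent tokens and the two least frequent tokens are returned together with their counts, which is the 75:75 scheme in miniature.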
[Figure 13 plot: Match Comparison Web Dev]
Figure 13: Keyword Matches Web Developer


Appendix B2


Appendix C: Execution 75:75 Output


Evaluating File [100.00%] : DONE: ./JobDescriptions/javaDev.txt
('CPSC1301_WangS_80814_Fall2018.docx', 20)
('CPSC1302_ShushaneR_80821_Fall2018.pdf', 20)
('CPSC6105_YangJ_80928_Fall2018.docx', 20)
('CPSC6105_YangJ_81041_Fall2018.docx', 20)
[keyword frequency table: illegible]

Evaluating File [100.00%] : DONE: ./JobDescriptions/infoTech.txt
('CPSC6105_YangJ_80928_Fall2018.docx', 15)
('CPSC6105_YangJ_81041_Fall2018.docx', 15)
[keyword frequency table: illegible]

Evaluating File [100.00%] [remainder of line truncated]
('CPSC6105_YangJ_80928_Fal [remainder of line truncated]
[keyword frequency table: illegible]

Evaluating File [100.00%] : DONE: ./JobDescriptions/threatAnalyst.txt
('CPSC6105_YangJ_80928_Fall2018.docx', 24)
('CPSC6105_YangJ_81041_Fall2018.docx', 24)
[keyword frequency table: illegible]

Evaluating File [100.00%] : DONE: ./JobDescriptions/webDeveloper.txt
('CPSC1105_NguyenT_80702_Fall2018.pdf', 12)
('CPSC1301L-WangS_80818_Fall2018.docx', 12)
('CPSC1301L_FleenorH_80817_Fall2018.html', 12)
('CPSC1301_FleenorH_80813_Fall2018.html', 12)
('CPSC3131_Lee, Y_80904_Fall2018.docx', 12)
('CPSC5157U_YangJ_81040_Fall2018.docx', 12)
('CPSC5157U_YangJ_82214_Fall2018.docx', 12)
('CPSC6105_YangJ_80928_Fall2018.docx', 12)
('CPSC6105_YangJ_81041_Fall2018.docx', 12)
[keyword frequency table: illegible]

Evaluating File [100.00%] [remainder of line truncated]
('CPSC4000_SummersW_80908_Fal [remainder of line truncated] 28)
[keyword frequency table: illegible]

Keywords Being Used: 150
File Name | Java Dev | Info Tech | Simulation | Cyber Sec | Web Dev | Game Dev |

[Console output table: one block per syllabus file. Each block lists the matched keywords out of 150 for each job description, followed by the rows "Words In File", "Weight of Result", and "Calculated Strength", and a per-file SD value. Most cell values are illegible in the source scan.]